Recognizing Textual Entailment in Non-English Text via Automatic Translation into English
Authors
Abstract
We show that a task that typically involves rather deep semantic processing of text (in our case study, recognizing textual entailment) can be solved successfully without any tools specific to the language of the texts on which the task is performed. Instead, we automatically translate the text into English using a standard machine translation system and then perform all linguistic processing, including syntactic and semantic analysis, using only English-language tools. In this case study we use annotated Italian data. Textual entailment is a relation between two texts. To detect it, we use various measures that allow us to make entailment decisions in the two-way classification task (YES/NO). We define several heuristics and measures, based on lexical relations, for evaluating entailment between two texts. To make entailment judgments, the system applies named entity recognition, chunking, part-of-speech tagging, n-gram, and text similarity modules to both texts, all of them designed for English rather than Italian. Rules were developed to perform the two-way entailment classification, and the system makes its judgments based on the entailment scores of the text pairs. The system was evaluated on Italian textual entailment data sets: we trained it on the Italian development data using the WEKA machine learning toolkit and tested it on the Italian test data. The system achieves an accuracy of 0.525 on the development corpus and 0.66 on the test corpus, which is a good result given that no Italian-specific linguistic information was used.
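The abstract describes the decision step only at a high level. The sketch below is a minimal illustration, assuming the Italian text/hypothesis pair has already been machine-translated into English, of one lexical measure the abstract names (n-gram overlap) feeding a two-way YES/NO rule. It is not the authors' system: named entity recognition, chunking, part-of-speech tagging, and the WEKA-trained classifier are all omitted, and the function names, the averaging over n = 1..2, and the 0.6 threshold are hypothetical choices made for this example.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of letters/digits (a crude stand-in for a real English tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """Return the set of contiguous n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_coverage(text: str, hypothesis: str, n: int) -> float:
    """Fraction of the hypothesis's distinct n-grams that also occur in the text."""
    h_grams = ngrams(tokenize(hypothesis), n)
    if not h_grams:
        return 0.0
    t_grams = ngrams(tokenize(text), n)
    return len(h_grams & t_grams) / len(h_grams)


def entailment_score(text: str, hypothesis: str, max_n: int = 2) -> float:
    """Hypothetical score: mean n-gram coverage for n = 1..max_n."""
    scores = [ngram_coverage(text, hypothesis, n) for n in range(1, max_n + 1)]
    return sum(scores) / len(scores)


def judge(text: str, hypothesis: str, threshold: float = 0.6) -> str:
    """Two-way decision: YES if the score reaches the (hypothetical) threshold, else NO."""
    return "YES" if entailment_score(text, hypothesis) >= threshold else "NO"


def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Share of pairs whose predicted label matches the gold label."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("need equally long, non-empty label lists")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


if __name__ == "__main__":
    t = "The Pope visited Rome in 2005."
    print(judge(t, "The Pope visited Rome."))        # YES: every unigram and bigram is covered
    print(judge(t, "The Pope was born in Poland."))  # NO: score (0.5 + 0.2) / 2 = 0.35
```

In the system the abstract describes, rules and a WEKA-trained model produce the final decision from several such scores; a single fixed threshold is used here only to keep the sketch self-contained.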
Similar resources
Using Recognizing Textual Entailment as a Core Engine for Answer Validation
This paper is about our approach to answer validation, which is centered on a Recognizing Textual Entailment (RTE) core engine. We first combined the question and the answer into a Hypothesis (H) and viewed the document as the Text (T); then, we used our RTE system to check whether the entailment relation holds between them. Our system was evaluated on the Answer Validation Exercise (AVE) task and achieve...
LIMSIILES: Basic English Substitution for Student Answer Assessment at SemEval 2013
In this paper, we describe a method for assessing student answers, modeled as a paraphrase identification problem, based on substitution by Basic English variants. Basic English paraphrases are acquired from the Simple English Wiktionary. Substitutions are applied both on reference answers and student answers in order to reduce the diversity of their vocabulary and map them to a common vocabula...
JU_CSE_NLP: Language Independent Cross-lingual Textual Entailment System
This article presents the experiments carried out at Jadavpur University as part of participation in Task 8, Cross-lingual Textual Entailment for Content Synchronization (CLTE), at the Semantic Evaluation Exercises (SemEval-2012). The work explores cross-lingual textual entailment as a relation between two texts in different languages and proposes different measures for entailment decision in a...
Systemic Functional Linguistics as a Tool of Text Analysis for Translation
Translation, ipso facto, is an understanding and a transferal of meaning from one language into another. Therefore, it may be fitting to conclude that a suitable semantic theory should underpin any attempt to that end. This paper advocates implementing Systemic Functional Linguistics (henceforth SFL) which subscribes to a view of language as a "meaning-potential". In fact, Halliday and Matthies...
Multiple Alternative Sentence Compressions and Word-Pair Antonymy for Automatic Text Summarization and Recognizing Textual Entailment
The University of Maryland participated in three tasks organized by the Text Analysis Conference 2008 (TAC 2008): (1) the update task of text summarization; (2) the opinion task of text summarization; and (3) recognizing textual entailment (RTE). At the heart of our summarization system is Trimmer, which generates multiple alternative compressed versions of the source sentences that act as cand...